Audio-visual person identification on the XM2VTS database
Abstract
This paper presents a multimodal person identification system based on a combination of audio and visual classifiers. The audio classifier is built on mel-frequency cepstral coefficient (MFCC) features and Gaussian mixture models. The visual classifier uses Haar-like features and the AdaBoost algorithm for face detection, and principal component analysis for identification. A new method is proposed to estimate the optimal weighting parameter for combining the two classifiers, based on probability density function estimation under Gaussian assumptions. Simulations indicate that the proposed method obtains slightly better results than the frequently used empirical method of optimising the weight on held-out training data.
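The empirical baseline mentioned in the abstract, choosing the fusion weight by optimising on held-out data, can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the linear combination below, the grid search, and all names (`fuse`, `identify`, `accuracy`, `best_alpha`, `held_out`) are assumptions made for illustration. The scores are taken to be already normalised to a comparable range. This is not the paper's proposed Gaussian-based estimator.

```python
def fuse(audio_scores, visual_scores, alpha):
    """Assumed linear fusion: alpha weights audio, (1 - alpha) weights visual."""
    return [alpha * a + (1.0 - alpha) * v
            for a, v in zip(audio_scores, visual_scores)]


def identify(audio_scores, visual_scores, alpha):
    """Return the index of the enrolled person with the highest fused score."""
    fused = fuse(audio_scores, visual_scores, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


def accuracy(trials, alpha):
    """Fraction of held-out trials identified correctly for a given weight.

    Each trial is (audio_scores, visual_scores, true_index).
    """
    correct = sum(1 for a, v, label in trials if identify(a, v, alpha) == label)
    return correct / len(trials)


def best_alpha(trials, steps=21):
    """Grid-search the weight in [0, 1]; ties go to the smallest candidate."""
    candidates = [i / (steps - 1) for i in range(steps)]
    return max(candidates, key=lambda alpha: accuracy(trials, alpha))


if __name__ == "__main__":
    # Hypothetical held-out scores for two enrolled people (toy values).
    held_out = [
        ([0.9, 0.1], [0.4, 0.6], 0),  # audio is right, visual is wrong
        ([0.6, 0.4], [0.1, 0.9], 1),  # visual is right, audio is wrong
    ]
    alpha = best_alpha(held_out)
    print("chosen weight:", alpha, "accuracy:", accuracy(held_out, alpha))
```

In this toy data each modality alone gets one trial wrong, while an intermediate weight gets both right, which is the situation where tuning the weight pays off.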
Related resources
Audio-visual speaker identification using coupled hidden Markov models
In this paper, we investigate the use of the coupled hidden Markov models (CHMM) for the task of audio-visual text dependent speaker identification. Our system determines the identity of the user from a temporal sequence of audio and visual observations obtained from the acoustic speech and the shape of the mouth, respectively. The multi modal observation sequences are then modeled using a set ...
Robust Automatic Human Identification Using Face, Mouth, and Acoustic Information
Discriminatory information about person identity is multimodal. Yet, most person recognition systems are unimodal, e.g. the use of facial appearance. With a view to exploiting the complementary nature of different modes of information and increasing pattern recognition robustness to test signal degradation, we developed a multiple expert biometric person identification system that combines info...
Building Video Databases to Boost Performance Quantification – The DXM2VTS Database
Building a biometric database is an expensive task which requires high level of cooperation from a large number of participants. Currently, despite increased demand for large multimodal databases, there are only a few available. The XM2VTS database is one of the most utilized audio-video databases in the research community although it has been increasingly revealed that it cannot quantify perfo...
Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities
An audio-visual speaker identification system is described, where the audio and visual speech modalities are fused by an automatic unsupervised process that adapts to local classifier performance, by taking into account the output score based reliability estimates of both modalities. Previously reported methods do not consider that both the audio and the visual modalities can be degraded. The v...
Speaker and Speech Recognition by Audio-Visual Lip Biometrics
This paper proposes a new robust bi-modal audio visual speech and speaker recognition system by lip-motion and speech biometrics. To increase the robustness of speech and speaker recognition, we have proposed a method using speaker lip motion information extracted from video sequences with low resolution (128 ×128 pixels). In this paper we investigate a biometric system for speech recognition a...
Publication date: 2007